fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258
fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258Zelys-DFKH wants to merge 2 commits into
Conversation
|
Thanks for working on this @Zelys-DFKH — this addresses a real pain point. I filed the original issue (#2240) and have been thinking about the design here, especially considering future graph looping support (revisiting previous nodes via edge conditions). Proposal: Introduce two distinct statuses —
|
|
Agreed on SKIPPED vs COMPLETED. Updated; your analysis of the header-leak risk in
SKIPPED nodes live in
CANCELLED (abort-branch semantics) is a separate concern; I'll open an issue. |
Replace Status.COMPLETED + RuntimeError with Status.SKIPPED + None for
nodes bypassed via cancel_node. A skipped node never ran and should not
be marked completed; downstream readiness checks treat SKIPPED the same
as COMPLETED by keeping skipped nodes in completed_nodes.
- Add Status.SKIPPED enum value
- NodeResult.result accepts None; to_dict/from_dict round-trip via
{"type": "skipped"}; get_agent_results() returns [] for None
- _build_node_input filters SKIPPED deps before building
dependency_results, so downstream nodes receive only the original
task with no orphaned "From node_x:" header
- _from_dict restores Status.SKIPPED on node.execution_status via
the persisted NodeResult.status rather than blindly setting COMPLETED
- GraphResult gains skipped_nodes counter; completed_nodes now counts
only nodes that actually ran
- reset_on_revisit correctly clears SKIPPED nodes for loop re-entry
(they are already in completed_nodes)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Opened #2401 to track CANCELLED (abort-branch semantics). It also surfaces the |
Description
When
cancel_nodeis set in aBeforeNodeCallEventhook,_execute_noderaisedRuntimeErrorafter emittingMultiAgentNodeCancelEvent, which propagated through_execute_nodes_paralleland killed the graph. Downstream nodes never ran, the overall status becameFAILED, and interrupt-and-resume workflows had no recovery path.The fix introduces
Status.SKIPPED. The bypassed node recordsresult=Noneandstatus=Status.SKIPPED, lands incompleted_nodesso downstream readiness checks are satisfied, and returns cleanly. UsingStatus.COMPLETEDhere would be wrong: a node that never ran is not completed, and conflating the two makes it impossible to distinguish "ran successfully" from "intentionally bypassed" in loop control and observability tools.Three fixes came alongside:
_build_node_inputfilters SKIPPED deps before buildingdependency_results, so when all upstream deps are skipped the downstream node receives the original task with no orphaned"From node_x:"header_from_dictreads the persistedNodeResult.statusto restoreStatus.SKIPPEDcorrectly on resume, rather than hardcodingStatus.COMPLETEDfor everything incompleted_nodesGraphResultgainsskipped_nodes: int;completed_nodesnow counts only nodes that actually ranAlso opened #2401 for the complementary CANCELLED semantic (abort-branch, downstream does not run). That issue raises the
cancel_node→skip_nodenaming question; if maintainers prefer Option A, the rename can happen in this PR before it merges.Related Issues
Resolves #2240
Documentation PR
N/A
Type of Change
Bug fix
Testing
Ran
hatch run prepare. Updatedtest_graph_cancel_node(parametrized ×2) to assertStatus.SKIPPEDon bothNodeResult.statusandnode.execution_status. Updatedtest_graph_cancel_node_downstream_executesto assert: downstream node executes (step_b.stream_async.assert_called_once()), step_b receives only the original task with no orphaned header,graph_result.skipped_nodes == 1, andgraph_result.completed_nodes == 1.hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.